OSINT is an indispensable tool for identifying weaknesses within your organization, mitigating data breaches and uncovering leaked data before bad actors do.
Cyberattacks have become the norm in today's complex digital threat landscape. Almost every week, we hear about a significant cyber incident in which threat actors compromise a company's confidential data. A data breach can cost an organization dearly in both money and reputation. For instance, according to IBM's 2023 Cost of a Data Breach Report, the global average cost of a data breach in 2023 was USD 4.45 million, a 15% increase over three years.
OSINT tools and techniques can be essential in discovering breached and leaked information about a specific entity, whether an individual or a company. This post will explain how OSINT can be used to discover breached and leaked data online. However, before we begin, it is worth differentiating between the terms "data breach" and "data leak."
Data breach vs. data leak
A data breach occurs when threat actors deliberately access or steal sensitive information, for example by exploiting vulnerabilities in security controls or bypassing authentication mechanisms to infiltrate the target network. A data leak, on the other hand, occurs when sensitive data is exposed to the public by mistake. Regardless of how the sensitive information was exposed, we use the same OSINT techniques to find it.
How OSINT can help in discovering leaked and breached data
There are different tools and techniques OSINT gatherers can leverage to detect sensitive breached and leaked information online. Here are the most useful ones:
The Wayback Machine
This is the first service we should use to detect leaked information online. The Wayback Machine keeps a massive archive of past versions of websites from around the world. For instance, we can check whether previous versions of the target company's website reveal sensitive information, such as the contact information of employees or suppliers, or whether some web folders were left without password protection in the past.
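Checking old snapshots by hand does not scale, so a minimal sketch of listing archived URLs through the Wayback Machine's CDX search endpoint may help. The function names are illustrative, and the query parameters (matchType, collapse, limit) reflect the CDX server documentation as I understand it, so verify them before relying on this.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str, limit: int = 50) -> str:
    """Build a CDX query URL that lists archived captures for a domain."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains of the target
        "output": "json",       # JSON array of rows, first row is the header
        "collapse": "urlkey",   # keep one row per unique URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(body: str) -> list[dict]:
    """Turn the CDX JSON body (header row + capture rows) into dicts."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, row)) for row in captures]


def snapshot_url(capture: dict) -> str:
    """Build the replay URL for one capture."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"
```

Fetching is left to the caller, for example with urllib.request.urlopen(build_cdx_query("example.com")). The resulting snapshot URLs can then be reviewed for old contact pages or directory listings.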
Data breach search engines
A data breach search engine enables us to search across a large number of previous data breaches and across the dark web to discover whether our online account information (e.g., username, email, phone number) was part of a previous data breach. This enables us to take immediate measures to limit the damage and cancel or deactivate the compromised accounts.
There are different services for finding breached information; the most popular ones are:
- Have I Been Pwned
- Dehashed
- Leakpeek (see Figure 1)
- F‑Secure Identity Theft Checker
- Avast Hack Check
- Norton Dark Web Monitoring
FIG 1 | Data breach search engines are effective in searching within billions of leaked records to discover whether your information is leaked or included in a previous data breach
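Some of these services also offer an API, which is useful when checking many accounts. The following is a minimal sketch for the Have I Been Pwned v3 "breachedaccount" endpoint. It assumes you hold a paid API key, that the key goes in the hibp-api-key header, and that the default response is a JSON list of objects with a Name field; the user-agent string and function names are made up for illustration. Check the current API documentation before use.

```python
import json
from urllib.parse import quote
from urllib.request import Request

HIBP_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_hibp_request(account: str, api_key: str) -> Request:
    """Prepare (but do not send) a breach lookup for one account."""
    url = HIBP_BASE + quote(account.strip(), safe="")
    headers = {
        "hibp-api-key": api_key,               # assumption: key from your HIBP subscription
        "user-agent": "breach-check-sketch",   # hypothetical name; HIBP expects a user agent
    }
    return Request(url, headers=headers)


def breach_names(body: str) -> list[str]:
    """Extract breach names from the JSON body of a successful lookup."""
    return [entry["Name"] for entry in json.loads(body)]
```

Sending the request with urllib.request.urlopen raises an HTTPError with status 404 when the account is not found in any breach, so callers should treat that case as "no results" rather than a failure.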
Search pastebin and file hosting websites
Hackers often use pastebin websites to share data from breaches and leaks anonymously. They may post samples or entire dumps of compromised data on these sites. If email addresses from a specific company appear in a pastebin data dump, it can be a strong indicator that the company suffered a data breach. Here are some popular pastebin websites:
- Pastebin
- OSU Open Source Lab Pastebin
- justpaste.it
- deone
- paste2
- PSBDMP: enter the email address you want to search for after the URL
- Paste Sources: this is a Twitter account that tweets the presence of pastes containing breached data
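Once a suspicious paste has been downloaded, checking it for company email addresses can be automated. Here is a minimal sketch that counts addresses belonging to a given domain, including its subdomains. The regular expression is a deliberately simple approximation of email syntax, not a full validator.

```python
import re
from collections import Counter

# Simplified email pattern; group 1 captures the host part after the "@".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def company_emails(paste_text: str, domain: str) -> Counter:
    """Count email addresses in a paste that belong to the given domain."""
    domain = domain.lower()
    hits = Counter()
    for match in EMAIL_RE.finditer(paste_text):
        host = match.group(1).lower()
        # Accept the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            hits[match.group(0).lower()] += 1
    return hits
```

A paste with many distinct addresses from one domain deserves a closer look, while a single hit may simply be a contact address quoted in unrelated text.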
But pastebin websites contain more than just leaked and breached information. For instance, some developers may share code snippets on pastebin websites while developing web applications. The shared code could reveal important information about the target company's web applications that hackers could exploit to gain entry into the company's internal systems.
For example, I found the following code file on the pastebin.com website: https://pastebin.com/raw/9wF9zKxX (see Figure 2). After analyzing the content, we can discover the following weaknesses in the shared code file:
- The password "admin" is hardcoded in the login() function
- The SQL code uses unparameterized queries, which could be vulnerable to SQL injection
- The data seems to be stored in the SQLite database without encryption
- Any logged-in user can delete rows of data
- User inputs like names/numbers don't seem to have validation
- Full names, contact info and addresses are stored, risking exposure of personally identifiable information if the data is compromised
After analyzing the code file, threat actors may exploit any discovered weakness to execute their attack.
FIG 2 | A code file shared on a Pastebin website can reveal important technical information
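The first two weaknesses in the list above, hardcoded passwords and SQL built from strings, can be flagged with a quick first-pass triage. The sketch below is a line-based heuristic, not a real static analyzer: the patterns are assumptions of mine, they will miss many cases and can raise false positives, and they are not taken from the paste in Figure 2.

```python
import re

# Heuristic patterns; both are illustrative assumptions, not a complete rule set.
CHECKS = {
    # e.g.  password == "admin"   or   pwd = 'secret'
    "hardcoded password": re.compile(
        r"""(?i)\b(?:password|passwd|pwd)\b\s*={1,2}\s*['"][^'"]+['"]"""
    ),
    # e.g.  "SELECT ... WHERE name = '" + name   (string concatenation or % formatting)
    "string-built SQL": re.compile(
        r"""(?i)\b(?:select|insert|update|delete)\b.*['"]\s*[+%]\s*\w"""
    ),
}


def triage(source: str) -> list[tuple[int, str]]:
    """Return (line number, finding) pairs for lines that match a check."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

Running triage over your own organization's shared snippets before publishing them is the defensive counterpart of what an attacker would do with a found paste.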
Data breach alerting service
Some online services allow users to find out whether their personal information, such as an online account username and/or email address, has been compromised. The service monitors popular data breach repositories and notifies the user when their personal details are found. Here are two popular free data breach alerting services; there are also many commercial services:
Search public files
Sensitive files containing usernames, passwords or contact information may sometimes be left unprotected and accessible online. OSINT techniques can be used to gather these exposed files. One such technique is using Google dorks. Here are some examples:
intitle:"index of" password
intitle:"index of" "/ftpusers"
We can add the "site" operator to any of the above search queries to limit our search to a specific domain name.
OSINT -inurl:(htm|html|php|pls|txt) intitle:index.of "last modified" (mp4|wma|aac|avi|PDF|doc|docx|xlsx|xls)
site:drive.google.com inurl:password
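When the same dorks are run against several domains, adding the "site" operator by hand gets tedious. This minimal sketch scopes a dork to one domain and builds the matching search URL. The helper names are hypothetical, and the URL format is simply the public Google search address with a q parameter.

```python
from urllib.parse import quote_plus


def scope_dork(dork: str, domain: str) -> str:
    """Prefix a dork with site:<domain> unless it already has a site: operator."""
    if "site:" in dork.lower():
        return dork
    return f"site:{domain} {dork}"


def search_url(dork: str) -> str:
    """Build a Google search URL for a dork, percent-encoding the query."""
    return "https://www.google.com/search?q=" + quote_plus(dork)
```

For example, scope_dork('intitle:"index of" password', "example.com") yields site:example.com intitle:"index of" password, which can be pasted into the search box or opened through search_url.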
We can use AI technology to generate Google dorks using the following two services:
FIG 3 | Generate Google dorks using Claude.AI
File metadata
File metadata could reveal important information about your company's IT infrastructure and employees. For instance, we can get important information about any company by inspecting their public files’ metadata, such as sales brochures, whitepapers and meeting agendas. File metadata may contain the following information about your company:
- Information about the file author: most PDF and MS Office files contain the author, corporate names and sometimes an email address of the creator
- Time stamp information: when the file was first created, updated and accessed
- Geographic location: some files, such as images and videos, could contain the GPS coordinates of the location where they were taken
- Capturing device model: we can get the camera model via EXIF data, which reveals the device type (smartphone, tablet or traditional camera), and sometimes we can get the device ID
- Edit history: if the document is PDF or MS Office, we can get editing history, which may reveal authors’ names
- Information about how the file was generated: for example, using a PDF driver or a scanner
By inspecting our own public files' metadata first, we can find and remove sensitive details before threat actors exploit them to conduct malicious actions against our company.
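To make the author and time stamp items concrete, here is a minimal sketch that reads the core properties of a modern MS Office file (.docx, .xlsx, .pptx) using only the Python standard library. These files are ZIP archives that normally store this metadata in docProps/core.xml. The function name and the choice of four fields are mine, and PDF or image metadata would need a different approach.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output label -> element to look up under the root.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def ooxml_core_properties(data: bytes) -> dict:
    """Return the core metadata fields present in an Office Open XML file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Raises KeyError if the file has no core properties part.
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[label] = node.text
    return found
```

Passing the bytes of a published brochure, for instance from Path("brochure.docx").read_bytes(), shows which names and dates would be visible to anyone who downloads it.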
There are some tools for searching for public files and extracting their associated metadata and other hidden information automatically.
- Metagoofil: an information-gathering tool that extracts metadata from public documents belonging to a target company
- FOCA: a Windows tool for finding metadata and hidden information in the documents it scans
- ExifTool: if you already have image files and want to view their EXIF metadata, this is the best tool for the job
Note: while I usually recommend open-source tools, Authentic8 has a commercially available Silo Image Metadata Viewer offered in their managed attribution platform. This is a great option for those who want built-in functionality within their research environment and are concerned with maintaining anonymity and security.
Analyze social media posts
Monitoring the social media profiles of the target company's employees can reveal valuable information. For example, suppose an employee posts a picture on Facebook taken in their office during work. Analyzing the image may disclose important information, such as:
- Office layout: knowing the target company office layout could assist attackers in crafting social engineering attacks
- Physical security measures in place: such as the presence of security guards, surveillance cameras and the type of locks used to secure door entrances
- Type of IT infrastructure: analyzing the visible equipment, such as desktop computer types, networking devices and, if the image is taken in the server room, server types
- Desk content: if the picture displays an employee desk, we can enlarge the image to see what is displayed on the computer screen, or read passwords and phone numbers written on sticky notes. There are some tools for enlarging images to read their contents more efficiently, such as: pixelied, imageresizer and AI Image Enlarger (the last one uses AI)
- Other employee information: such as the employee badge design and the information it contains (e.g., employee department)
- Design and type of furniture: this could reveal information about the company’s culture and how employees spend their time during work, which can help attackers craft customized social engineering attacks
The information employees post on social media is not limited to personal photos. Employees discussing confidential future business plans regarding new partnerships, suppliers or market expansion can expose critical information before the company officially releases it. Other sensitive data like product launches, software vulnerabilities, customer data and financials shared on social media can also negatively impact businesses.
Analyzing recruitment websites
Current and previous employees post detailed information about their job duties (see Figure 4) on professional social networks, such as LinkedIn, and on recruitment websites to show their experience and increase their employability when planning to move to another job. Threat actors can harvest such information to understand the type of IT infrastructure and security controls used in the target company.
FIG 4 | Checking target company employee profiles on LinkedIn could reveal important technical information about the target company IT infrastructure and security solutions
We can also reverse the process and read the previous job vacancies posted by the target company (see Figure 5) to gather information about its IT infrastructure.
FIG 5 | Job announcements could reveal important information about the target company's IT infrastructure
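Reading many job postings by hand is slow, so a minimal sketch of scanning a posting for technology names may be useful. The watchlist is a hypothetical example of mine, not drawn from the figures, and should be replaced with the products relevant to your own review.

```python
import re

# Hypothetical watchlist of product names; adjust to your own needs.
WATCHLIST = ["Active Directory", "Cisco ASA", "Splunk", "VMware", "Office 365"]


def technologies_mentioned(posting: str, watchlist=WATCHLIST) -> list[str]:
    """Return watchlist entries that appear as whole terms in the posting."""
    found = []
    for name in watchlist:
        # Lookarounds stop "Splunk" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, posting, flags=re.IGNORECASE):
            found.append(name)
    return found
```

Running this over your own company's past vacancies gives a quick picture of how much of the internal technology stack has already been disclosed publicly.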
Assess vulnerabilities using OSINT
We can use OSINT to assess the target company's security vulnerabilities, such as weak passwords and outdated components in the software solutions used across the company. We should also conduct the same assessment on all suppliers and partners that have strong business relationships with the target company. Some ways to use OSINT in this endeavor include the following:
- Review public code repositories like GitHub for exposed secrets such as passwords or configuration files
- Check WHOIS records and network scans for outdated software/services with known vulnerabilities
- Check expired domains and FTP servers for unprotected files containing sensitive information
- Assess the company's website for any sensitive information that should be removed
- Use Internet of Things (IoT) search engines, such as Shodan, to discover vulnerable systems across your IT environment and those of your close vendors
- When using a third-party managed security provider, you should conduct regular audits of their IT systems to ensure they do not become entry points into your IT environment
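As an illustration of the first item, reviewing code repositories for exposed secrets, here is a minimal sketch that scans a local checkout of one of your own repositories. It only looks for two well-known shapes: strings formatted like AWS access key IDs (AKIA followed by 16 uppercase letters or digits) and PEM private key headers. The function names are mine, and dedicated secret scanners cover far more patterns.

```python
import re
from pathlib import Path

# Two illustrative patterns only; real scanners ship many more rules.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[str]:
    """Return the labels of all secret patterns found in the text."""
    return [label for label, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def scan_checkout(root: str) -> dict[str, list[str]]:
    """Scan every file under a local repository checkout for secret patterns."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Undecodable bytes are skipped so binary files do not abort the scan.
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                findings[str(path)] = hits
    return findings
```

Any hit should be treated as a compromised credential and rotated, since removing it from the latest commit does not remove it from the repository history.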
As we saw, OSINT can be used effectively to acquire information that helps organizations mitigate data breaches and discover data leaks before threat actors can find and exploit them.